Abstract: Quantitative characterization of randomly roving
agents in the Agent Based Intrusion Detection Environment
(ABIDE) is studied. Simplifications of formulas from known
results and publications are given. The Extended Agent Based
Intrusion Detection Environment (EABIDE) is introduced, and
the quantitative characterization of roving agents in EABIDE
is studied.

I. Introduction 

Wireless sensor networks (WSN) are composed of thou- 
sands of nodes that are spatially distributed in an unattended 
area usually without prior knowledge of the network topology. 
They act as a real time environmental monitoring tool by 
sensing and reporting environmental data to the base station, 
which usually happens in a multi-hop way. In many WSN 
applications, like hostile area monitoring or when WSN acts 
as an intrusion detection system for a building, the security 
of the network is crucial. Especially when network nodes 
are deployed in an unattended area an adversary can have 
a physical access to them which will allow him to read, 
modify or erase the content of a node. In some deployments 
node replication attack also becomes feasible. The aim of an 
intrusion detection system (IDS) for those networks is to act as 
the second defence line against network attacks that preventive 
mechanisms fail to address 0. An Intrusion detection system 
for a network is a system that dynamically monitors the events 
taking place on a network and decides whether these events 
are symptoms of an attack or constitute a legitimate use of 
the system [7]. Comprehensive surveys on IDS for WSN are
presented in [1], [12].

Agent based intrusion detection systems became popular 
because of their scalability, reconfigurability and survivability 
10, 0, El, 0, 0. It is more difficult for an attacker to 
deal with such IDS as they do not have defined structures 
and are not predictable. In this work we discuss an agent
based intrusion detection system called ABIDE (Agent Based
Intrusion Detection Environment) [14], [13], [11], which uses
autonomous software agents for intrusion detection in
computer networks. In ABIDE, autonomous agents move randomly
through a network along communication links and record or
calculate unique information on randomly selected nodes. An
example of such unique information is a checksum of the
operating system running on a node, which can help to
determine whether it has been modified. Later each agent
passes the data it collected to a special agent, which
combines the data received from various agents and tries to
determine whether an intrusion took place (more details on
ABIDE are given in Section II). The authors of [11] try to
calculate the number of agents required by ABIDE for detecting
intrusions in a network of a given size with a given
probability. The formulas presented in [11] that relate the
number of agents to the probability of an intrusion being
detected, such as Formula (1), are complex and unwieldy, and
their simplifications or approximations are of interest. For
the same reason, [11] resorts to computer simulation instead
of using Formula (1) to understand the typical number of
agents necessary to retrieve the required information in a
network.
Our work aims to prove simple formulas analytically, for
the same numerical characteristics of ABIDE, which can be
used to understand the relation between the number of agents
and the amount of information that can be gathered by them,
without resorting to software simulations. We also propose
an extended version of ABIDE, called EABIDE, and consider
the same quantitative characteristics for it. As a result we
obtain formulas that express the relation between the number
of roving agents in EABIDE and the amount of information that
can be gathered by them in terms of Stirling numbers of the
second kind. Known asymptotic estimates for Stirling numbers
of the second kind can further be applied to get more compact
approximations [4], [9], [16].

II. Agent Based Intrusion Detection Environment 
(ABIDE) 

Consider a network where each node has a software agent
hosting environment (i.e. software agents can move into a
node, perform some action and leave). ABIDE [11] uses four
different kinds of agents to organize intrusion detection and
correction in the system.

1) A Data Mining Agent (DMA) roams around in a network
(i.e. randomly chooses a host node and moves
there) and acquires environmental information from
nodes. A DMA is lightweight and uses the simplest mining
algorithms. For example, a DMA may calculate a checksum
of the operating system that runs on a host node, and if
it decides that the value of the checksum is suspicious
it can keep the value and carry it on for further analysis.



2) A Data Fusion Agent (DFA) roams around or is located 
on the base station. It receives the data collected by 
various DMAs and builds a larger picture of events 
from this data. As the DFA has the combined data, it
can apply classical intrusion detection techniques to
determine whether an intrusion took place or not. Of
course the power of the DFA depends on the quantity 
of information received from DMAs. 

3) Nodes that have been identified as suspicious by DFA 
are further visited by a Probe Agent (PA), sent by DFA, 
which performs a test on a host node to confirm the 
intrusion. 

4) Once the intrusion is confirmed by a PA a Corrective 
Agent (CA) can be dispatched by a DFA to take actions. 
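
As an illustration of the lightweight check a DMA might perform, the sketch below compares a checksum of a node's operating system image with a known-good reference value. The function names, the use of SHA-256 and the existence of a stored reference checksum are assumptions made for this example; the source does not specify the checksum algorithm.

```python
import hashlib


def image_checksum(image: bytes) -> str:
    """Return a hex digest of the given image (SHA-256 is an assumed choice)."""
    return hashlib.sha256(image).hexdigest()


def is_suspicious(image: bytes, expected_checksum: str) -> bool:
    """A node is flagged when its checksum differs from the reference value."""
    return image_checksum(image) != expected_checksum


# Hypothetical usage: the reference is computed once from a trusted image.
reference = image_checksum(b"trusted os image")
print(is_suspicious(b"trusted os image", reference))
print(is_suspicious(b"modified os image", reference))
```

A DMA following this sketch would keep only the values flagged as suspicious and hand them to the DFA.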

We aim to answer the following question: what is the
probability of identifying intrusions in a network of a given
size with a given set of DMAs in the presence of a single
DFA, where the DFA needs information from at least t distinct
nodes [11] in order to determine whether there is an
intrusion? The answer can further be used to calculate the
number of DMAs required for identifying intrusions in a given
network with a given probability.

Formally, the problem we consider is the following. We are
given a set of $k$ DMAs which roam around in a network of $n$
nodes. Each DMA has storage where it can keep data from $m$
different nodes. A DMA returns to the DFA as soon as it has
acquired data from exactly $m$ randomly chosen distinct nodes.
Note that when a DMA moves into a node it is not obliged to
take actions there; the node can be used as an intermediate
hop for roaming, and in this way randomness of the visited
nodes (nodes where data has been collected) can be guaranteed.
It is required to calculate the probability $P_k(n,m,t)$ of the
DFA having data from exactly $t$ distinct nodes. Note that each
DMA gathers data from $m$ distinct nodes, but the data gathered
by two different DMAs may intersect. [11] provides the
following formula

P k (n,m,t) = 

Of course, (1) is unwieldy, and simplifications or
approximations are of interest. For the same reason, [11]
uses computer simulations to approximate the value of
$P_k(n,m,t)$. Below we present formula simplifications that
allow the exact value of $P_k(n,m,t)$ to be computed without
software simulations.

Fig. 1. Matrix representation of visited nodes.
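
For reference, the random experiment behind $P_k(n,m,t)$ can be sketched as a small Monte Carlo simulation. This is only an illustration under the stated model (each of the $k$ DMAs returns data from $m$ distinct nodes chosen uniformly at random); the function name, the number of trials and the seed are assumptions of this example, not part of ABIDE.

```python
import random
from collections import Counter


def simulate_abide(n, m, k, trials=20000, seed=0):
    """Estimate the distribution of the number of distinct nodes known to the DFA."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        covered = set()
        for _ in range(k):
            # Each DMA brings data from exactly m distinct, uniformly chosen nodes.
            covered.update(rng.sample(range(n), m))
        counts[len(covered)] += 1
    # Map each observed union size t to its relative frequency.
    return {t: c / trials for t, c in sorted(counts.items())}


if __name__ == "__main__":
    for t, freq in simulate_abide(n=20, m=4, k=3).items():
        print(t, round(freq, 3))
```

The estimated frequencies can only take values of t between m and min(km, n), which matches the range over which $P_k(n,m,t)$ is nonzero.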

III. Coverage Characterization of Roving Agents 
IN ABIDE 

Consider a set $N = \{v_1, \ldots, v_n\}$ of $n$ nodes and subsets
$S_i \subseteq N$, $i = 1, \ldots, k$, where subset $S_i$ corresponds
to the set of nodes visited by agent $i$ and is of size $m$ (here
we say a node is visited by agent $i$ if $i$ collected data from
that node, i.e. nodes that were used as intermediate hops for
roaming are not considered as visited). We consider a
probability distribution scheme over $N$. As the nodes visited
by agents are random, the subsets $S_i$, $i = 1, \ldots, k$, are
independent and equiprobable. There are $C_n^m$ subsets of size
$m$ in total, so each of them occurs with probability $1/C_n^m$.
We are interested in probabilistic characteristics of the union
$\bigcup_{i=1}^{k} S_i$ and its size. In particular, what is the
probability that the union of those subsets contains exactly
$t$ elements?



$$P_k(n,m,t) = \Pr\left( \left| \bigcup_{i=1}^{k} S_i \right| = t \right) \qquad (2)$$



Consider a matrix $A_{k \times n} = \{a_{ij}\}$ (Figure 1), where

$$a_{ij} = \begin{cases} 1 & \text{if } v_j \in S_i \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
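
The matrix representation can be sketched in a few lines of Python. Here nodes are numbered from 0 and each agent's visited set is given as a Python set; both conventions, as well as the function names, are assumptions of this example.

```python
def incidence_matrix(subsets, n):
    """Row i has a 1 in column j exactly when node j belongs to subsets[i]."""
    return [[1 if j in s else 0 for j in range(n)] for s in subsets]


def union_size(matrix):
    """Count the columns with at least one 1, i.e. the size of the union of the subsets."""
    return sum(1 for column in zip(*matrix) if any(column))


# Two agents with m = 3 in a network of n = 6 nodes.
A = incidence_matrix([{0, 1, 2}, {2, 3, 4}], n=6)
for row in A:
    print(row)
print(union_size(A))
```

Each row sums to m, and the all-zero columns correspond to nodes that no agent has visited.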



From $|S_i| = m$ it follows that each row of matrix $A$ is
composed of exactly $m$ ones and $n - m$ zeros. Column $j$ of
matrix $A$ represents the node $v_j$, and it is composed of zeros
alone if and only if none of the $k$ agents visited the node
$v_j$, i.e. none of the subsets $S_i$ contains $v_j$. Therefore the
union $\bigcup_{i=1}^{k} S_i$ is composed of exactly $t$ distinct
elements if and only if $A$ contains exactly $n - t$ columns
composed of zeros alone and all the other columns contain at
least one 1. Hence the number of possibilities to get
information from exactly $t$ out of $n$ nodes, with $k$ agents
equipped with a memory of size $m$, is given by the number of
such matrices $A$. Denote by $Q(k,m,t)$ the number of $k \times t$
sub-matrices $Q$ (Figure 2) that have exactly $m$ ones in each
row and at least one 1 in each column. Then the number of
$k \times n$ matrices with exactly $m$ ones in each row and with
exactly $n - t$ columns with no ones is

$$C_n^t \cdot Q(k,m,t) \qquad (4)$$

where $C_n^t$ stands for the number of possibilities to pick $t$
out of $n$ nodes (columns) and $Q(k,m,t)$ stands for the number
of possibilities to cover all the $t$ nodes by $k$ agents equipped
with a memory of size $m$.

Fig. 2. Sub-matrix Q.

1 Later in the paper, by "agent" we mean a DMA.


$Q(k,m,t)$ can be calculated by the inclusion-exclusion
principle. First, over $k \times t$ matrices we take all the
matrices with exactly $m$ ones in each row; then we remove all
the matrices that have at least one column filled entirely
with zeros (such matrices do not satisfy the conditions we
require); then we add back the matrices with at least 2
columns filled with zeros, and so on. The corresponding
formula is

$$Q(k,m,t) = \sum_{i=0}^{t-m} (-1)^i C_t^i \left( C_{t-i}^m \right)^k \qquad (5)$$

We have proven the following theorem.
Theorem 1.

$$P_k(n,m,t) = \frac{C_n^t \cdot Q(k,m,t)}{\left( C_n^m \right)^k} \qquad (6)$$



Proof: The proof follows from (4), (5) and the fact that
the number of $k \times n$ matrices with exactly $m$ ones in each
row is $\left( C_n^m \right)^k$. ■
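
The statement of Theorem 1 can be checked numerically on small inputs. The sketch below assumes that the inclusion-exclusion count takes the form of a sum over i of (-1)^i C(t,i) C(t-i,m)^k (terms with t - i < m vanish, so the sum may run up to i = t), and compares the resulting probability with a brute-force enumeration of all choices of k subsets of size m. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def q_count(k, m, t):
    """Number of k x t 0/1 matrices with exactly m ones per row and no all-zero column."""
    # math.comb returns 0 when t - i < m, so the extra terms contribute nothing.
    return sum((-1) ** i * comb(t, i) * comb(t - i, m) ** k for i in range(t + 1))


def p_abide(n, m, k, t):
    """Probability that k agents holding m distinct nodes each cover exactly t of n nodes."""
    return comb(n, t) * q_count(k, m, t) / comb(n, m) ** k


def p_abide_bruteforce(n, m, k, t):
    """The same probability by enumerating every choice of k size-m subsets (tiny inputs only)."""
    subsets = list(combinations(range(n), m))
    hits = sum(1 for choice in product(subsets, repeat=k)
               if len(set().union(*choice)) == t)
    return hits / len(subsets) ** k


if __name__ == "__main__":
    n, m, k = 5, 2, 2
    for t in range(m, min(k * m, n) + 1):
        print(t, p_abide(n, m, k, t), p_abide_bruteforce(n, m, k, t))
```

The brute-force version grows as C(n,m)^k and is meant only as a cross-check; the closed form needs at most t + 1 terms.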

First of all, here we obtain a real simplification of (1). The
resulting formula is still complex, but it can be calculated
easily, and applying the Markov inequality may give asymptotic
estimates of t-subset probabilities ifTUl . 

Another important characteristic, the mean value of subset 
size t, might be computed as: 

$$\sum_{t}^{\min(km,n)} t \cdot P_k(n,m,t) =$$
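
The mean coverage can also be evaluated numerically. The sketch below sums t times the probability of Theorem 1 over t (written with the assumed inclusion-exclusion form of Q) and compares the result with n(1 - (1 - m/n)^k). That closed form is a standard consequence of linearity of expectation and is used here only as a cross-check; it is not taken from the source, and the function names are hypothetical.

```python
from math import comb


def p_abide(n, m, k, t):
    """Probability of covering exactly t of n nodes (Theorem 1, assumed form of Q)."""
    q = sum((-1) ** i * comb(t, i) * comb(t - i, m) ** k for i in range(t + 1))
    return comb(n, t) * q / comb(n, m) ** k


def mean_coverage(n, m, k):
    """Expected number of distinct nodes, summed directly over the distribution."""
    return sum(t * p_abide(n, m, k, t) for t in range(m, min(k * m, n) + 1))


def mean_coverage_check(n, m, k):
    """Cross-check: a fixed node is missed by one agent with probability 1 - m/n."""
    return n * (1 - (1 - m / n) ** k)


if __name__ == "__main__":
    print(mean_coverage(10, 3, 4), mean_coverage_check(10, 3, 4))
```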

IV. Extended Agent Based Intrusion Detection
Environment (EABIDE) 

We generalize the intrusion detection system proposed in
[11] by allowing data mining agents (DMAs) to collect
redundant data. In contrast with the original version of
ABIDE, where each DMA collects data from $m$ randomly
chosen distinct nodes, here a DMA is allowed to have more
than one instance of the same data in its memory (i.e. on each
visit of the same node the data might be calculated and stored
again). DMAs do not store several copies of the same data on
purpose; rather, this can be unavoidable in networks where
nodes are indistinguishable from the DMA's point of view. The
latter might be required by the security system of the network
(e.g. if nodes use randomized and encrypted IDs, a DMA cannot
recognize a node visited before, as it will have a different
ID, so the DMA cannot tell that data collected during two
different visits came from the same node). As a result, when
the memory of a DMA is full it will contain data from
$1 \le l \le m$ distinct nodes, in contrast with $m$ in the case
of ABIDE. A data fusion agent (DFA), having access to the
security schemes deployed in the network, can sort out the
data received from a DMA, discard redundant data and keep the
$l$ pieces of distinct data.

In the Extended Agent Based Intrusion Detection Environment
(EABIDE) we are interested in the same question as before:
what is the probability of identifying intrusions in a
network of a given size with a given set of DMAs
in the presence of a single DFA, where the DFA needs
information from at least $t$ distinct nodes in order to
determine whether there is an intrusion? The answer can
further be used to calculate the number of DMAs required for
identifying intrusions in a given network with a given
probability.

Formally, the problem we consider is the following. We are
given a set of $k$ DMAs which roam around in a network of $n$
nodes. Each DMA has storage where it can keep $m$ pieces of
data. A DMA returns to the DFA as soon as it has acquired $m$
pieces of data from randomly chosen nodes (from the DMA's
point of view all the $m$ pieces of data are different). Note
that when a DMA moves into a node it is not obliged to take
actions there; the node can be used as an intermediate hop for
roaming, and in this way randomness of the visited nodes
(nodes where data has been collected) can be guaranteed. It is
required to calculate the probability $P_k^E(n,m,t)$ of the DFA
having data from exactly $t$ distinct nodes. The difference
from ABIDE is that not only may the data gathered by different
DMAs intersect, but the data in the memory of a single DMA may
also be redundant.
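
The EABIDE experiment can be sketched as a simulation in which each stored piece of data comes from an independently and uniformly chosen node, so repeats within one DMA are possible. The uniform, independent choice of nodes is the modelling assumption of this example, and the function name, trial count and seed are hypothetical.

```python
import random
from collections import Counter


def simulate_eabide(n, m, k, trials=20000, seed=0):
    """Estimate the distribution of distinct nodes seen by the DFA when DMAs may store repeats."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        covered = set()
        for _ in range(k):
            # m independent uniform visits; the same node may be recorded more than once.
            covered.update(rng.randrange(n) for _ in range(m))
        counts[len(covered)] += 1
    return {t: c / trials for t, c in sorted(counts.items())}


if __name__ == "__main__":
    for t, freq in simulate_eabide(n=20, m=4, k=3).items():
        print(t, round(freq, 3))
```

Unlike in ABIDE, union sizes below m can occur here, since a single DMA may return fewer than m distinct nodes.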

V. Coverage Characterization of Roving Agents 
in EABIDE 

Consider a set $N = \{v_1, \ldots, v_n\}$ of $n$ nodes and subsets
$S_i^* \subseteq N$, $i = 1, \ldots, k$, where subset $S_i^*$
corresponds to the set of distinct nodes visited by agent $i$
(after removing repeated nodes, i.e. the set of nodes from the
DFA's point of view) and $1 \le |S_i^*| \le m$ (here we say a
node is visited by agent $i$ if $i$ collected data from that
node, i.e. nodes that were used as intermediate hops for
roaming are not considered as visited). We consider a
probability distribution scheme over $N$. We are interested in
probabilistic characteristics of the union
$\bigcup_{i=1}^{k} S_i^*$ and its size. In particular, what is
the probability that the union of those subsets contains
exactly $t$ elements?



$$P_k^E(n,m,t) = \Pr\left( \left| \bigcup_{i=1}^{k} S_i^* \right| = t \right) \qquad (8)$$



This time the matrix $B_{k \times n} = \{b_{ij}\}$ corresponding to
the subsets $S_i^*$ is

$$b_{ij} = \begin{cases} 1 & \text{if } v_j \in S_i^* \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$



From $1 \le |S_i^*| \le m$ it follows that in each row of
matrix $B$ there are at least one and at most $m$ ones, and
the rest is filled with zeros. Column $j$ of matrix $B$
represents the node $v_j$, and it is composed of zeros alone if
and only if none of the $k$ agents visited the node $v_j$, i.e.
none of the subsets $S_i^*$ contains $v_j$. Therefore the union
$\bigcup_{i=1}^{k} S_i^*$ is composed of exactly $t$ distinct
elements if and only if $B$ contains exactly $n - t$ columns
composed of zeros alone and all the other columns contain at
least one 1. Hence the number of possibilities to get
information from exactly $t$ nodes of a network of $n$ nodes,
with $k$ agents that each fetch $1 \le l_i \le m$ unique pieces
of data, is given by the number of such matrices $B$. Denote by
$R(k,m,t)$ the number of $k \times t$ sub-matrices $R$ that have
$1 \le l_i \le m$ ones in the $i$-th row (for all possible
$l_i$) and at least one 1 in each column. Then the number of
$B$ matrices is



$$C_n^t \cdot R(k,m,t) \qquad (10)$$



where $C_n^t$ stands for the number of possibilities to pick $t$
out of $n$ nodes (columns) and $R(k,m,t)$ stands for the number
of possibilities to cover all the $t$ nodes by $k$ agents.

To calculate the number of $B$ matrices, we first prove
the following lemma, which shows the similarity between
the ABIDE and EABIDE schemes.

Lemma 1. The probability of covering exactly $t$ out of $n$ nodes
with one agent having a memory of $m$ units in the EABIDE scheme
is equal to the probability of covering exactly $t$ out of $n$
nodes with $m$ agents having a memory of 1 unit in the ABIDE
scheme:

$$P_1^E(n,m,t) = P_m(n,1,t)$$

Proof: Since at any point of time each node has the same
probability of being visited by an agent in the EABIDE scheme
(even those nodes that have already been visited), each cell
of the agent's memory can be considered as an individual agent
having a memory of size 1, which leads to $m$ agents with one
unit of memory in the ABIDE scheme. ■

Corollary 1. $P_k^E(n,m,t) = P_{km}(n,1,t)$

Proof: The proof is similar to the proof of Lemma 1. ■



Theorem 2.

$$R(k,m,t) = Q(km,1,t) = \sum_{i=0}^{t-1} (-1)^i C_t^i (t-i)^{mk} \qquad (11)$$

Proof: The proof follows from Theorem 1 and Corollary 1. ■

Corollary 2.

$$P_k^E(n,m,t) = \frac{C_n^t \cdot R(k,m,t)}{n^{mk}} \qquad (12)$$

Proof: The proof follows from Theorem 2 and Corollary 1. ■



Finally, we note that $R(k,m,t)$ has an equivalent representation
in terms of Stirling numbers of the second kind [4]:

$$S(N,K) = \frac{1}{K!} \sum_{j=0}^{K} (-1)^j C_K^j (K-j)^N \qquad (13)$$



Formally, in the formula for $R(k,m,t)$ we may add the zero
term for $i = t$, and then we obtain

$$R(k,m,t) = t! \, S(mk,t) \qquad (14)$$
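
The identity between $R(k,m,t)$ and the Stirling numbers can be verified numerically. The sketch below computes Stirling numbers of the second kind with the standard recurrence S(N,K) = K S(N-1,K) + S(N-1,K-1) and compares t! S(mk,t) with the alternating sum for $R(k,m,t)$; the function names are hypothetical.

```python
from math import comb, factorial


def stirling2(N, K):
    """Stirling number of the second kind via S(N,K) = K*S(N-1,K) + S(N-1,K-1)."""
    row = [1] + [0] * K  # row for N = 0: S(0,0) = 1 and S(0,j) = 0 for j > 0
    for _ in range(N):
        new = [0] * (K + 1)
        for j in range(1, K + 1):
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[K]


def r_count(k, m, t):
    """Alternating sum for R(k,m,t) with i running from 0 to t-1."""
    return sum((-1) ** i * comb(t, i) * (t - i) ** (m * k) for i in range(t))


if __name__ == "__main__":
    k, m = 2, 3
    for t in range(1, m * k + 1):
        print(t, r_count(k, m, t), factorial(t) * stirling2(m * k, t))
```

Both columns of the printed output agree for every t, which reflects the fact that t! S(mk,t) counts the surjections from the mk stored items onto t nodes.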



The Stirling number of the second kind $S(N,K)$ is the number
of ways to partition a set of $N$ objects into $K$ non-empty
subsets. Existing asymptotic estimates for these numbers [4],
[9], [16] yield simple approximations for $R(k,m,t)$ and
therefore for $P_k^E(n,m,t)$.

The following theorem, which is the final result of this
paper, can be formulated.



Theorem 3.

$$P_k^E(n,m,t) = \frac{C_n^t \cdot t! \, S(mk,t)}{n^{mk}} \qquad (15)$$
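
As a usage sketch, the EABIDE probability can be evaluated directly and used to search for the smallest number of DMAs that reaches a target detection probability. The code assumes the probability has the form C(n,t) t! S(mk,t) / n^(mk), with t! S(mk,t) computed through the alternating sum; the helper names, the target threshold and the search bound k_max are all hypothetical.

```python
from math import comb


def surjections(N, t):
    """t! * S(N, t): number of ways N stored items can hit every one of t given nodes."""
    return sum((-1) ** i * comb(t, i) * (t - i) ** N for i in range(t + 1))


def p_eabide(n, m, k, t):
    """Probability that k EABIDE agents with m memory cells each cover exactly t of n nodes."""
    return comb(n, t) * surjections(m * k, t) / n ** (m * k)


def min_agents(n, m, t_min, target, k_max=1000):
    """Smallest k for which Pr(at least t_min distinct nodes) >= target, or None if not found."""
    for k in range(1, k_max + 1):
        tail = sum(p_eabide(n, m, k, t) for t in range(t_min, min(m * k, n) + 1))
        if tail >= target:
            return k
    return None


if __name__ == "__main__":
    print(min_agents(n=50, m=5, t_min=30, target=0.95))
```

The search is linear in k for simplicity; since the probability of covering at least t_min nodes does not decrease as agents are added, a binary search would also work.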



VI. Conclusion 



In its current state, the intrusion detection system called
ABIDE [11] relies on software simulations to understand the
number of data mining agents required for identifying
intrusions in a system with a given probability. In the
current paper we gave formulas that allow this number to be
computed analytically. Further, we considered the extended
version of ABIDE (EABIDE) and proved formulas for the same
quantitative characteristics. The formulas for EABIDE are
expressed in terms of Stirling numbers of the second kind [4],
[9], [16], which makes it possible to obtain asymptotic
estimates and further simplifications for the quantitative
characteristics of EABIDE. In the future it will be
interesting to consider the same quantitative characteristics
analytically for more general cases of the ABIDE and EABIDE
schemes with more than one DFA.